[observer] storage bounds: drop min/max columns + cap points per series#50449
Draft
Eokye wants to merge 4 commits into
Conversation
Two changes to bound storage memory:

1. Remove `mins`/`maxes` columnar arrays from `seriesStats`. No active detector uses `AggregateMin` or `AggregateMax` — BOCPD, ScanWelch, ScanMW, and RRCF all use Average+Count only. Saves 16 bytes per point (40→24). The aggregate constants remain for API compatibility; they return 0.
2. Add `maxPointsPerSeries` (default 600 in live mode, unlimited in tests). When a new point would exceed the cap, the oldest points are trimmed from the front of the columnar arrays. At 1Hz this retains 10 minutes of history — sufficient for ScanWelch (`MinPoints=30`) and BOCPD (purely incremental, doesn't re-read history). Prevents unbounded growth in long-running agents.
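A minimal sketch of the cap-and-trim behavior described in point 2, assuming illustrative field names (`timestamps`, `averages`, `counts` are stand-ins, not the PR's actual columns). Note that the simple re-slice used here keeps the old backing array alive; a follow-up commit in this PR replaces it with copy-on-trim.

```go
package main

import "fmt"

// seriesStats holds columnar per-point arrays. Field names are
// illustrative, not the actual struct from this PR.
type seriesStats struct {
	timestamps []int64
	averages   []float64
	counts     []uint32
}

const maxPointsPerSeries = 600

// addPoint appends a point, then trims the oldest entries from the
// front of every column once the cap is exceeded.
func (s *seriesStats) addPoint(ts int64, avg float64, count uint32) {
	s.timestamps = append(s.timestamps, ts)
	s.averages = append(s.averages, avg)
	s.counts = append(s.counts, count)
	if excess := len(s.timestamps) - maxPointsPerSeries; excess > 0 {
		// Re-slicing keeps the backing array alive; see the
		// copy-on-trim commit for the GC-friendly version.
		s.timestamps = s.timestamps[excess:]
		s.averages = s.averages[excess:]
		s.counts = s.counts[excess:]
	}
}

func main() {
	var s seriesStats
	for i := 0; i < 1000; i++ {
		s.addPoint(int64(i), float64(i), 1)
	}
	fmt.Println(len(s.timestamps), s.timestamps[0]) // 600 400
}
```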
Contributor
Gitlab CI Configuration Changes
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 361 | 0 | 0 |
Updated: .gitlab/distribution.yml
Changes Summary
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 0 | 2 | 0 |
ℹ️ Diff available in the job log.
Contributor
Go Package Import Differences
Baseline: e5b320d
Contributor
Static quality checks ❌
Please find below the results from static quality gates: Error.
Gate failure full details
Static quality gates prevent the PR from merging!
Successful checks: Info
On-wire sizes (compressed)
Re-slicing (s[trim:]) keeps the original backing array alive because the new slice still references into it. Allocate fresh right-sized slices and copy the live portion so the old arrays become unreferenced and the GC can reclaim them.
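The copy-on-trim fix above can be sketched as a small generic helper (the name `trimFront` is hypothetical, not the PR's actual function): allocate a fresh right-sized slice and copy the live tail, so nothing aliases the old backing array.

```go
package main

import "fmt"

// trimFront returns a fresh, right-sized copy of s without its first n
// elements. Unlike s = s[n:], the result does not alias the old backing
// array, so the GC can reclaim the trimmed prefix.
func trimFront[T any](s []T, n int) []T {
	out := make([]T, len(s)-n)
	copy(out, s[n:])
	return out
}

func main() {
	vals := make([]float64, 1000)
	vals = trimFront(vals, 400)
	fmt.Println(len(vals), cap(vals)) // 600 600
}
```

The trade-off is one allocation plus a copy per trim, in exchange for releasing the retained prefix; the SMP numbers below show the resulting memory/CPU delta.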
Series with identical tag sets now share a single canonical `[]string` backed by a hash-keyed intern pool (fnv64a over sorted tags). This avoids one `[]string` allocation per new series for each repeated tag combination.

Design:
- `tagIntern map[uint64]*tagInternEntry` — keyed by the fnv64a hash of the sorted tag combination; the value holds the canonical `[]string` and a ref count of live series referencing it.
- `seriesStats.tagsHash uint64` — precomputed at creation (0 = not interned) for O(1) pool release on eviction.
- `internTags`: sorts, hashes, looks up the pool. Hit: increment the count, return the canonical slice. Miss + under cap: insert a new entry. Collision or cap exceeded: return a sorted copy with hash=0 (no sharing).
- `releaseTagIntern`: called from `RemoveSeriesByKeys` before the series is deleted; decrements the count and removes the pool entry when it hits 0.
- Cap at 4096 entries (matches the `dogstatsd_string_interner_size` default) to bound pool size under high-cardinality DogStatsD input.

Tests: PoolGrows, SharedSlice (`unsafe.SliceData` pointer identity), Eviction (ref count cleanup on `RemoveSeriesByKeys`), NilAndEmptyTags (sentinel invariant), UnsortedTagsShareEntry (sort normalization), Cap (4096 limit enforcement).
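A self-contained sketch of the interning scheme under the stated design (sort → fnv64a → ref-counted pool with a cap and collision fallback). The exact signatures and collision handling here are assumptions; only the names `tagIntern`, `tagInternEntry`, `internTags`, and `releaseTagIntern` come from the commit message.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// tagInternEntry holds a canonical sorted tag slice and a count of live
// series referencing it.
type tagInternEntry struct {
	tags []string
	refs int
}

const maxInternEntries = 4096 // matches dogstatsd_string_interner_size default

var tagIntern = map[uint64]*tagInternEntry{}

// internTags sorts and hashes the tag set, then returns the canonical
// slice from the pool, inserting on first sight. On hash collision or
// when the pool is full it returns a private sorted copy with hash 0.
func internTags(tags []string) ([]string, uint64) {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	h := fnv.New64a()
	for _, t := range sorted {
		h.Write([]byte(t))
		h.Write([]byte{0}) // separator so ["ab"] != ["a","b"]
	}
	key := h.Sum64()
	if e, ok := tagIntern[key]; ok {
		if !equalTags(e.tags, sorted) {
			return sorted, 0 // collision: no sharing
		}
		e.refs++
		return e.tags, key
	}
	if len(tagIntern) >= maxInternEntries {
		return sorted, 0 // pool full: no sharing
	}
	tagIntern[key] = &tagInternEntry{tags: sorted, refs: 1}
	return sorted, key
}

// releaseTagIntern drops one reference; the entry is freed at zero.
func releaseTagIntern(key uint64) {
	if key == 0 {
		return // sentinel: series was never interned
	}
	if e, ok := tagIntern[key]; ok {
		if e.refs--; e.refs <= 0 {
			delete(tagIntern, key)
		}
	}
}

func equalTags(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a, _ := internTags([]string{"env:prod", "app:web"})
	b, k := internTags([]string{"app:web", "env:prod"}) // unsorted input shares the entry
	fmt.Println(&a[0] == &b[0], len(tagIntern))
	releaseTagIntern(k)
	releaseTagIntern(k)
	fmt.Println(len(tagIntern))
}
```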
…O(1) eviction

Replace the three-hop enrichment chain (storageKey string → seriesContextRef → GetContextByKey string → MetricContext) with a direct O(1) ref-keyed lookup.

Hot-path wins:
- `LogMetricsExtractor`: eliminates the `metricContextKey` string alloc + `patternContext` map write per log. Context is now inline in `MetricOutput`; the engine stores it keyed by `SeriesRef` on first series creation.
- `enrichAnomaly`: no seriesKey rebuild, no `contextProviders` lookup, no `GetContextByKey` string map scan — just `contextByRef[ref]`.
- `removeContextsForEvictedKeys`: O(N) `contextRefs` scan → O(1) lookup via `contextKeyToStorageKey map[nsContextKey]string`.

Engine changes:
- `contextRefs map[string]seriesContextRef` removed.
- `contextProviders map[string]ContextProvider` removed.
- `contextByRef map[SeriesRef]*MetricContext`: updated on new series creation.
- `contextKeyToStorageKey map[nsContextKey]string`: O(1) eviction lookup.
- `fanOutSeriesRemoval`: also deletes from `contextByRef`.
- `enrichAnomaly`: ref-based lookup with Source fallback for bare-detector anomalies.

Other changes:
- def/component.go: `MetricOutput` gains `Context *MetricContext`; the `ContextProvider` interface is removed.
- storage.go: `AddResult` gains `Ref SeriesRef` (-1 for dropped points).
- context_provider.go: `collectContextProviders` removed.
- `LogMetricsExtractor`: `patternContext`, `GetContextByKey`, `metricContextKey` removed.
- `LogPatternExtractor`: keeps `GetContextByKey` for live cluster pattern queries; `logPatternExtractorContext.byKey` removed (context now in `MetricOutput.Context`, snapshotted at first-ingest time).
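The ref-keyed lookup can be sketched as follows. Types and method names here are illustrative stand-ins (only `SeriesRef`, `MetricContext`, `contextByRef`, and `enrichAnomaly` appear in the commit message); the point is that enrichment collapses to one map access with a Source fallback.

```go
package main

import "fmt"

// Illustrative types; the real definitions live in the observer component.
type SeriesRef int64

type MetricContext struct {
	Source string
	Tags   []string
}

type Anomaly struct {
	Ref    SeriesRef
	Source string
}

// engine keeps context keyed directly by SeriesRef, replacing the old
// storageKey-string → seriesContextRef → GetContextByKey chain.
type engine struct {
	contextByRef map[SeriesRef]*MetricContext
}

// onNewSeries stores the context once, at first series creation.
func (e *engine) onNewSeries(ref SeriesRef, ctx *MetricContext) {
	e.contextByRef[ref] = ctx
}

// enrichAnomaly is a single map lookup; anomalies with no registered
// context (bare detectors) fall back to their own Source field.
func (e *engine) enrichAnomaly(a *Anomaly) *MetricContext {
	if ctx, ok := e.contextByRef[a.Ref]; ok {
		return ctx
	}
	return &MetricContext{Source: a.Source}
}

// onSeriesRemoved releases the context when storage evicts the series.
func (e *engine) onSeriesRemoved(ref SeriesRef) {
	delete(e.contextByRef, ref)
}

func main() {
	e := &engine{contextByRef: map[SeriesRef]*MetricContext{}}
	e.onNewSeries(7, &MetricContext{Source: "logs", Tags: []string{"svc:web"}})
	fmt.Println(e.enrichAnomaly(&Anomaly{Ref: 7}).Source)                     // logs
	fmt.Println(e.enrichAnomaly(&Anomaly{Ref: 9, Source: "detector"}).Source) // detector
}
```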
Summary
Stacked on #50395. Four changes to reduce observer memory and CPU:

1. Drop unused min/max columns
Remove `mins`/`maxes` from `seriesStats` — no active detector uses `AggregateMin`/`AggregateMax`. Saves 16 bytes per point (40→24 bytes).

2. Cap points per series (`maxPointsPerSeries = 600`)
Trim the oldest points when a series exceeds the cap. At 1Hz: 10 minutes of history, 20× headroom for ScanWelch's `MinPoints=30`. Prevents unbounded columnar growth.

3. Tag combination interning
Series with identical tag sets share one canonical `[]string` via an fnv64a-hashed intern pool. Ref-counted: pool entries are freed when the last referencing series is evicted.

4. Eliminate the `contextRefs` string chain
Replace the three-hop enrichment path (storageKey rebuild → `contextRefs` scan → `GetContextByKey`) with direct ref-keyed lookups.
- `contextByRef map[SeriesRef]*MetricContext` — O(1) anomaly enrichment
- `contextKeyToStorageKey map[nsContextKey]string` — O(1) eviction (was an O(N) scan)
- `LogMetricsExtractor`: `metricContextKey` string alloc + `patternContext` map write eliminated from the hot path
- `AddResult` gains `Ref SeriesRef`; the `ContextProvider` interface is removed

SMP results
Baseline: 7.78.0. Experiment: `observer_logs_anomaly_stress`.
- Commit 1: drop min/max + cap points — f112ca04. Run 8fd945f0: memory 699 MiB, cpu 143 ≤ 500
- Commit 2: copy-on-trim — d9106dc8. Run 8601e3b2: memory 726 MiB, cpu 145 ≤ 500
- Reference (no storage bounds) — 1a99bc9b. Run e862928e: memory 631 MiB, cpu 135 ≤ 500

Test plan
dda inv test --targets=./comp/observer/impl/...